According to the World Federation of the Deaf, more than two hundred sign languages exist. Understanding deaf individuals is therefore challenging, even for proficient sign language users, which creates a barrier between the deaf community and the rest of society. To bridge this language barrier, we propose a novel multilingual communication system, namely MUGCAT, to improve the communication efficiency of sign language users. By converting recognized hand gestures into expressive pictures, which are universally usable and language-independent, our MUGCAT system significantly helps deaf people convey their thoughts. Sign language usually cannot be translated into complete sentences that people unfamiliar with it can follow; to overcome this limitation, we propose to reconstruct meaningful sentences from the incomplete translation of sign language. We also measure the semantic similarity between the generated sentences and the fragmented recognized hand gestures to preserve the original meaning. Experimental results show that the proposed system works in real time and synthesizes exquisite illustrations and meaningful sentences from a few sign language hand gestures. This shows that MUGCAT has promising potential in assisting deaf communication.
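This paper presents the outcomes of a contest organized to evaluate methods for the online recognition of heterogeneous gestures from sequences of 3D hand poses. The task is to detect gestures belonging to a dictionary of 16 classes characterized by different pose and motion features. The dataset features continuous sequences of hand tracking data in which the gestures are interleaved with non-significant motions. The data were captured using the HoloLens 2 finger tracking system in a realistic use case of mixed reality interaction. The evaluation is based not only on detection performance but also on latency and false positives, making it possible to understand the feasibility of practical interaction tools based on the proposed algorithms. The outcomes of the contest's evaluation demonstrate the need for further research to reduce recognition errors, while the computational cost of the proposed algorithms is sufficiently low.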
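Sketch-based 3D shape retrieval (SBSR) is an important yet challenging task that has drawn increasing attention in recent years. Existing approaches address the problem in a restricted setting, without appropriately simulating real application scenarios. To mimic a realistic setting, in this track we adopt large-scale sketches drawn by amateurs with different levels of drawing skill, as well as a variety of 3D shapes including not only CAD models but also models scanned from real objects. We define two SBSR tasks and construct two benchmarks comprising more than 46,000 CAD models, 1,700 realistic models, and 145,000 sketches. Four teams participated in this track and submitted 15 runs for the two tasks, evaluated by 7 commonly used metrics. We hope that the benchmarks, the comparative results, and the open-source evaluation code will foster future research in the 3D object retrieval community.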
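This paper describes the methods submitted to the SHREC 2022 track on pothole and crack detection in road pavement. A total of 7 different runs for the semantic segmentation of the road surface are compared: 6 from the participants plus a baseline method. All methods exploit deep learning techniques, and their performance is tested using the same environment (i.e., a single Jupyter notebook). The training set consists of 3836 semantic segmentation image/mask pairs and 797 RGB-D video clips collected with the latest depth cameras. The methods are then evaluated on the 496 image/mask pairs in the validation set, on the 504 pairs in the test set, and finally on 8 video clips. The analysis of the results is based on quantitative metrics for image segmentation and a qualitative analysis of the video clips. The participation and the results show that the scenario is of great interest and that using RGB-D data is still challenging in this context.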
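Although magnetic resonance imaging (MRI) has played an important role in infant brain analysis, segmenting MRI into tissues such as gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) is crucial yet complex, owing to the extremely low intensity contrast between tissues at around 6-9 months of age as well as amplified noise, myelination, and incomplete volume. In this paper, we tackle these limitations by developing a new deep learning model, named DAM-AL, which contains two main contributions: a dilated attention mechanism and a hard-case attention loss. Our DAM-AL network is designed with skip block layers and atrous convolution. It contains both channel-wise attention at high-level context features and spatial attention at low-level spatial structural features. Our attention loss consists of two terms corresponding to region information and hard samples. The proposed DAM-AL has been evaluated on the infant brain iSeg 2017 dataset, with experiments conducted on both the validation and testing sets. We have benchmarked DAM-AL on the Dice coefficient and ASD metrics and compared it with state-of-the-art methods.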
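For readers unfamiliar with the Dice coefficient mentioned above, the following is a minimal sketch of the standard per-label definition, Dice = 2|A ∩ B| / (|A| + |B|). It is not taken from the DAM-AL paper: the function name `dice_coefficient`, the flattened toy label maps, and the integer label encoding are all assumptions made for illustration.

```python
def dice_coefficient(pred, target, label):
    """Dice = 2|A ∩ B| / (|A| + |B|) for a single tissue label."""
    pred_mask = [p == label for p in pred]
    target_mask = [t == label for t in target]
    # Voxels where both the prediction and the ground truth carry the label
    intersection = sum(p and t for p, t in zip(pred_mask, target_mask))
    total = sum(pred_mask) + sum(target_mask)
    if total == 0:
        return 1.0  # label absent from both maps: treated here as full agreement
    return 2.0 * intersection / total

# Toy flattened label maps.
# Hypothetical encoding: 0 = background, 1 = CSF, 2 = GM, 3 = WM
pred = [0, 1, 1, 2, 2, 3, 3, 3]
target = [0, 1, 2, 2, 2, 3, 3, 0]

scores = {label: dice_coefficient(pred, target, label) for label in (1, 2, 3)}
for label, score in scores.items():
    print(f"label {label}: Dice = {score:.3f}")
```

A score of 1 means the predicted and reference regions for that label coincide exactly, and 0 means they do not overlap at all.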
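Recent years have witnessed great advances in object segmentation research. In addition to generic objects, aquatic animals have attracted research attention. Deep learning-based methods are widely used for aquatic animal segmentation and have achieved promising performance. However, challenging datasets for benchmarking are lacking. We therefore created a new dataset dubbed "Aquatic Animal Species." Furthermore, we devised a novel multimodal-based scene-aware segmentation framework that leverages the advantages of multiple view segmentation models to segment images of aquatic animals effectively. To improve training performance, we developed a guided mixup augmentation method. Extensive experiments comparing the proposed framework with state-of-the-art instance segmentation methods demonstrate that our method is effective and significantly outperforms existing methods.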
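This paper pushes the envelope on decomposing camouflaged regions in an image into meaningful components, namely camouflaged instances. To promote the new task of camouflaged instance segmentation, we introduce a dataset, dubbed CAMO++, with greater quantity and diversity. The new dataset substantially increases the number of images with hierarchical pixel-wise ground truths. We also provide a benchmark suite for the camouflaged instance segmentation task. In particular, we present an extensive evaluation on the newly constructed CAMO++ dataset in various scenarios. We also propose a camouflage fusion learning (CFL) framework for camouflaged instance segmentation to further improve the performance of state-of-the-art methods. The dataset, model, evaluation suite, and benchmark will be made publicly available on our project page: https://sites.google.com/view/ltnghia/research/camo_plus_plus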
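Flow-based generative models have recently become one of the most efficient approaches to modeling data generation. They are constructed from a sequence of invertible and tractable transformations. Glow first introduced a simple type of generative flow using an invertible $1 \times 1$ convolution. However, the $1 \times 1$ convolution has limited flexibility compared with standard convolutions. In this paper, we propose a novel invertible $n \times n$ convolution approach that overcomes the limitations of the invertible $1 \times 1$ convolution. In addition, our proposed network is not only tractable and invertible but also uses fewer parameters than standard convolutions. Experiments on the CIFAR-10, ImageNet, and Celeb-HQ datasets show that our invertible $n \times n$ convolution helps to significantly improve the performance of generative models.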
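To make "invertible and tractable" concrete, here is a minimal pure-Python sketch of the invertible $1 \times 1$ convolution that Glow introduced, which is the baseline this abstract builds on and not the proposed $n \times n$ method. A $1 \times 1$ convolution applies one channel-mixing matrix at every pixel, so it is inverted by applying the inverse matrix, and its log-determinant for an $h \times w$ input is $h \cdot w \cdot \log|\det W|$. The two-channel toy image, the weight values, and all function names are assumptions chosen for illustration.

```python
import math

def det2(w):
    """Determinant of a 2x2 matrix."""
    return w[0][0] * w[1][1] - w[0][1] * w[1][0]

def inv2(w):
    """Inverse of a 2x2 matrix (assumes a non-zero determinant)."""
    d = det2(w)
    return [[w[1][1] / d, -w[0][1] / d],
            [-w[1][0] / d, w[0][0] / d]]

def conv1x1(x, w):
    """Apply the channel-mixing matrix w at every pixel; x is x[row][col][channel]."""
    return [[[sum(w[o][c] * px[c] for c in range(len(px))) for o in range(len(w))]
             for px in row] for row in x]

def log_det_jacobian(x, w):
    """Log-determinant of the 1x1 convolution: height * width * log|det W|."""
    height, width = len(x), len(x[0])
    return height * width * math.log(abs(det2(w)))

# Hypothetical 2x2 image with 2 channels and an arbitrary invertible weight matrix
x = [[[1.0, 2.0], [0.5, -1.0]],
     [[3.0, 0.0], [-2.0, 4.0]]]
w = [[2.0, 0.5],
     [1.0, 1.5]]  # det = 2.5, so the map is invertible

y = conv1x1(x, w)            # forward pass
x_rec = conv1x1(y, inv2(w))  # inverse pass recovers the input
log_det = log_det_jacobian(x, w)
```

The log-determinant term is what keeps the exact likelihood tractable: it depends only on the weight matrix and the spatial size, not on the input values.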
In this paper, we propose a novel technique, namely INVALIDATOR, to automatically assess the correctness of APR-generated patches via semantic and syntactic reasoning. INVALIDATOR reasons about program semantics via program invariants, and it captures program syntax via language semantics learned from a large code corpus using a pre-trained language model. Given a buggy program and the developer-patched program, INVALIDATOR infers likely invariants on both programs. Then, INVALIDATOR determines that an APR-generated patch overfits if it (1) violates correct specifications or (2) maintains error behaviors of the original buggy program. If our approach fails to identify an overfitting patch based on invariants, INVALIDATOR uses a model trained on labeled patches to assess patch correctness based on program syntax. The benefit of INVALIDATOR is three-fold. First, INVALIDATOR leverages both semantic and syntactic reasoning to enhance its discriminative capability. Second, INVALIDATOR does not require new test cases to be generated; instead, it relies only on the current test suite and uses invariant inference to generalize the behaviors of a program. Third, INVALIDATOR is fully automated. We conducted our experiments on a dataset of 885 patches generated on real-world programs in Defects4J. Experimental results show that INVALIDATOR correctly classified 79% of overfitting patches, detecting 23% more overfitting patches than the best baseline. INVALIDATOR also substantially outperforms the best baselines by 14% and 19% in terms of Accuracy and F-Measure, respectively.
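The two-part overfitting rule described above can be illustrated with a toy sketch. This is not the INVALIDATOR implementation. It assumes invariants are plain strings collected into sets, that "correct specifications" are invariants holding on both the buggy and the developer-patched program, and that "error behaviors" are invariants holding only on the buggy program. The function name and the example invariants are hypothetical.

```python
def is_overfitting(buggy_inv, dev_inv, patch_inv):
    """Return True if the invariant-based rule flags the patch as overfitting.

    Assumed definitions (not stated in the abstract):
      correct specifications = invariants shared by the buggy and developer-patched programs
      error behaviors        = invariants of the buggy program that the developer patch removed
    """
    correct_specs = buggy_inv & dev_inv
    error_behaviors = buggy_inv - dev_inv
    violates_spec = not (correct_specs <= patch_inv)  # (1) some correct specification no longer holds
    keeps_error = bool(error_behaviors & patch_inv)   # (2) some error behavior still holds
    return violates_spec or keeps_error

# Hypothetical invariants for an off-by-one bug
buggy = {"x >= 0", "result != null", "idx <= size"}
developer = {"x >= 0", "result != null", "idx < size"}

plausible_patch = {"x >= 0", "result != null", "idx < size"}
keeps_bug_patch = {"x >= 0", "result != null", "idx <= size"}
breaks_spec_patch = {"x >= 0", "idx < size"}

print(is_overfitting(buggy, developer, plausible_patch))    # not flagged by invariants
print(is_overfitting(buggy, developer, keeps_bug_patch))    # flagged by rule (2)
print(is_overfitting(buggy, developer, breaks_spec_patch))  # flagged by rule (1)
```

A `False` result here only means the invariant check found nothing; per the abstract, such patches are then passed to the learned syntax-based classifier.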
Modern deep neural networks have achieved superhuman performance in tasks from image classification to game play. Surprisingly, these varied and complex systems with massive numbers of parameters exhibit the same remarkable structural properties in their last-layer features and classifiers across canonical datasets. This phenomenon is known as "Neural Collapse," and it was discovered empirically by Papyan et al. \cite{Papyan20}. Recent papers have shown theoretically that the global solutions of the network training problem under a simplified "unconstrained feature model" exhibit this phenomenon. We take a step further and prove that Neural Collapse occurs in deep linear networks for the popular mean squared error (MSE) and cross entropy (CE) losses. Furthermore, we extend our research to imbalanced data for the MSE loss and present the first geometric analysis of Neural Collapse in this setting.